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Abstract 

The ectoparasitic Monogenea comprise a major part of the obligate parasitic f latworm diversity. Although genomic adaptations to 
parasitism have been studied in the endoparasitic tapeworms (Cestoda) and flukes (Trematoda), no representative of the Monogenea 
has been investigated yet. We present the high-quality draft genome of Gyrodactyl us salaris, an economically important monogenean 
ectoparasite of wild Atlantic salmon (Salmo salar). A total of 1 5,488 gene models were identified, of which 7, 1 02 were functionally 
annotated. The controversial phylogenetic relationships within the obligate parasitic Neodermata were resolved in a phylogenomic 
analysis using 1,719 gene models (alignment length of > 500,000 amino acids) for a set of 1 6 metazoan taxa. The Monogenea were 
found basal to the Cestoda and Trematoda, which implies ectoparasitism being plesiomorphic within the Neodermata and strongly 
supports a common origin of complex life cycles. Comparative analysis of seven parasitic flatworm genomes identified shared 
genomic features for the ecto- and endoparasitic lineages, such as a substantial reduction of the core bilaterian gene complement, 
including the homeodomain-containing genes, and a loss of the piwi and vasa genes, which are considered essential for animal 
development. Furthermore, the shared loss of functional fatty acid biosynthesis pathways and the absence of peroxisomes, the latter 
organelles presumed ubiquitous in eukaryotes except for parasitic protozoans, were inferred. The draft genome of G. salaris opens for 
future in-depth analyses of pathogenicity and host specificity of poorly characterized G. salaris strains, and will enhance studies 
addressing the genomics of host-parasite interactions and speciation in the highly diverse monogenean flatworms. 
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Introduction 

Obligate parasitic flatworms constitute one of the three larg- 
est groups of metazoan parasites of vertebrates (the others 
being the nematodes and arthropods) and include many spe- 
cies of medical and veterinary importance. Schistosomes, for 
example, are responsible for about 300,000 human deaths 
annually (van der Werf et al. 2003), and a range of tapeworms 
cause morbidity and mortality in humans and domestic live- 
stock (e.g., Echinococcus, Taenia, and Diphyllobothrium). 

The phylogenetic relationships of the phylum 
Platyhelminthes and the major lineages within have been in- 
tensively discussed for decades. Initially, the main challenge 
was the basal position within Bilateria and the potential para- 
phyly with acoelomorph flatworms. Recent studies (Baguna 
and Riutort 2004; Hejnol et al. 2009; Philippe et al. 2011) 
provided strong evidence for Platyhelminthes and 



Acoelomorpha being separate phyla, with Acoelomorpha 
basal within Bilateria. Accordingly, Platyhelminthes would in- 
clude the remaining free-living and parasitic flatworms and 
represent derived protostomian lophotrochozoans (Baguna 
and Riutort 2004). 

Genome-wide data are currently only available for a few 
platyhelminth species including the planarian Schmidtea med- 
iterranea (Robb et al. 2008), a model for stem cell biology 
(Gentile et al. 2011), and some important human parasites, 
including the flukes Schistosoma mansoni (Berriman et al. 
2009), 5. japonicum (Schistosoma japonicum Genome 
Sequencing and Functional Analysis Consortium 2009), and 
Clonorchis sinensis (Wang et al. 201 1), and the tapeworms 
Echinococcus multilocularis, E. granulosus (Zheng et al. 201 3), 
7aen/a solium, and Hymenolepis microstoma (Tsai et al. 201 3). 
No genome-wide data have yet been published for a 
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monogenean species. The obligate parasitic flatworms form 
the monophyletic Neodermata, a well-established lineage 
based on the name giving Neodermis. This larval secondary 
syncytial tegument is believed to be the key innovation of 
Neodermata that allowed for their immense radiation 
(Littlewood 2006). The Neodermata currently comprise the 
Monogenea (Monopisthocotylea and Polyopisthocotylea), 
Cestoda (tapeworms; Eucestoda and Cestodaria), and 
Trematoda (flukes; Aspidogastrea and Digenea) (reviewed in 
Olson and Tkach (2005)). The Monogenea are characterized by 
a direct ectoparasitic lifestyle, whereas the Cestoda and 
Trematoda engage in endoparasitic life cycles of varying com- 
plexity. Traditional views on the evolution of Neodermata based 
on morphology (Janicki 1 920), as well as early molecular studies 
using 18S ribosomal DNA markers (Littlewood et al. 1999), 
supported an early divergence of the Trematoda and a sister 
group relationship of Monogenea and Cestoda (reviewed in 
Lockyer et al. 2003). More recently, a sister group relationship 
between Cestoda and Trematoda was supported by nucleotide 
sequences of the combined 18S and 28S ribosomal genes 
(Lockyer et al. 2003), mitochondrial DNA (Park et al. 2007; 
Perkins et al. 201 0), and microRNA loci (Fromm et al. 201 3). 

Understanding the phylogenetic relationships within the 
Neodermata is an essential prerequisite for understanding 
the evolutionary origins of endo- and ectoparasitism as well 
asof the complex lifecyclesof flatworms. Based on the recently 
published genomesof the tapeworms E. multilocularis, E. gran- 
ulosus, T. solium, and H. microstoma, Tsai et al. (2013) pre- 
sented several genomic traits interpreted as adaptations to 
the parasitic lifestyle in flatworms. Their conclusions, however, 
must be considered preliminary because no representative of 
the Monogenea had been included in the analyses, hindering a 
comprehensive assessment of the issue. This study targets the 
genome of G. salaris Malmberg 1 957, a significant pathogen 
of Atlantic salmon {Salmosalar) causing severe ecological and 
economic damage in Norway and Russia (Harris et al. 201 1 ). 

We present the draft genome of G. salaris assembled from 
combined Roche 454 FLX Titanium and lllumina GAII NGS 
reads, and provide the first genomic reference for a 
monogenean flatworm. The controversial phylogenetic 
relationships within Neodermata were addressed using a 
large-scale phylogenomic approach. Furthermore, recently 
reported gains and losses of genetic traits as either specific 
for parasitic flatworms or tapeworms (Tsai et al. 2013) were 
assessed. With the monogenean draft genome at hand, their 
significance for the evolution of ecto- and endoparasitism in 
flatworms is discussed. 

Materials and Methods 

Sample Collection, DNA Extraction, and Next-Generation 
Sequencing 

Genomic DNA was extracted from a pooled sample of 
-15,000 individuals of G. salaris obtained from an 



experimentally reared parasite population on Atlantic 
Salmon (Salte et al. 2010) using the E.Z.N.A. Tissue DNA kit 
(Omega Bio-Tek) following the Tissue DNA-Spin Protocol. 
Library preparation and the Roche 454 FLX Titanium and 
lllumina GAII sequencing were performed at the Norwegian 
Sequencing Centre (Oslo, Norway). The lllumina reads were 
trimmed and end-clipped to a phred score of 33 using cus- 
tom scripts and subsequently error corrected using the error 
correction tool of the SOAPdenovo2 software (Luo et al. 
201 2). For the use in Overlap Layout Consensus (OLC) assem- 
blers, the data set was digitally normalized to a k-mer cover- 
age of 20 using tools from the khmer package (Pell et al. 
2012). 

De Novo Assembly of G. salaris Genomic NGS Reads and 
Removal of Nontarget Sequences 

In order to optimally assemble the draft genome for G. salaris 
a range of de Bruijn graph (DBG) assemblers, that is, Velvet 
1 .2.07 (Zerbino and Birney 2008), ABySS 1 .3.4 (Simpson et al. 
2009), SOAPdenovo 1.0.5 (Li et al. 2010b) were utilized on 
the lllumina reads. Furthermore, hybrid assemblies of 454 and 
normalized lllumina GAII data were performed using the OLC 
assemblers Newbler 2.6 (Margulies et al. 2005) and Celera 7.0 
(Myers et al. 2000). Sequence assemblies obtained by DBG 
and OLC approaches were quality assessed using standard 
assembly metrics such as N50, total number of contigs, and 
total length of the assembly (supplementary file S2, table S1, 
Supplementary Material online). Guanine-Cytosine (GC)- 
coverage scatter plots (Kumar and Blaxter 2011) were pro- 
duced for the assemblies (fig. 1 and supplementary file S1, 
fig. S1 , Supplementary Material online) and the conjunction of 
all information was used to choose the best assembly (see 
supplementary file S1, Section A for details, Supplementary 
Material online). Putative nontarget contigs were removed 
prior to subsequent analyses (supplementary file S1, Section 
A for details, Supplementary Material online). Completeness 
of the gene space in the draft was assessed using CEGMA 2.3 
(Parra et al. 2007, 2009). Trimmed lllumina reads were 
mapped back to the conservative draft assembly using BWA 
(Li and Durbin 2009) in order to identify the putative G. salaris 
readpool. K-mer frequencies (20-mer) were subsequently cal- 
culated for this readpool using respective tools from the 
khmer package (Pell et al. 2012) in order to estimate the 
G. salaris genome size based on the k-mer frequency distribu- 
tion (Li et al. 2010a) (see supplementary file S1, Section A, 
Supplementary Material online). Single nucleotide polymor- 
phisms (SNPs) were called using Freebayes (Garrison and 
Marth 2012) and SNP density and transition/transversion 
ratio were calculated using VCFtools (Danecek et al. 
2011). To minimize false positives, SNPs called at positions 
with low (<20x) and high coverage (>100x) were not 
considered. 
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Fig. 1. — Aggregate properties (coverage and GC content) of contigs obtained by Celera 7.0 as (a) scatter plot and (£>) heat map. Colored areas in 
(a) illustrate the aggressive cleaning strategy adopted in this study, that is, the removal of putative nontarget contigs of coverage <40x (red; putative host 
contamination) and GC content >50% (blue; putative bacterial contamination) (see supplementary file S1, Section A for details, Supplementary Material 
online). 



Gene Prediction and Annotation 

A G. salaris specific library of repeat families, including simple 
repeats, interspersed repeats, and satellite DNAs was identi- 
fied from the assembled draft genome using Repeat Modeler 
(Smit and Hubley 2008-2010). Subsequently, the automated 
annotation pipeline MAKER 2.26 (Holt and Yandell 201 1 ) was 
used for structural annotation (supplementary file S1, Section 
B for details, Supplementary Material online). Repetitive DNA 
was masked using RepeatMasker (Smit et al. 1 996-201 0) and 
the G. salaris specific repeat library. MAKER then reconciled 
homology-based physical evidence with the results of purely 
ab initio predicted gene models obtained by means of 
the gene predictors GeneMark (Lomsadze et al. 2005), 
SNAP (Korf 2004), and Augustus (Stanke and Waack 2003) 
(see supplementary file S1, Section B for further details, 
Supplementary Material online). All gene models were 
screened for known domains using lnterProScan5RC6 
(Quevillon et al. 2005) and subjected to a BLAST search 
(Altschul et al. 1997) against a custom build of NCBI's nr pro- 
tein database (restricted to Metazoan proteins). B2G4pipe 2.5 
(Conesa et al. 2005) was used to establish functional annota- 
tions, that is, to assign putative biological functions and Gene 
Ontology terms (Ashburner et al. 2000) to the G. salaris gene 
models. The initial functional annotations were reconciled 
with the InterProScan results and mapped onto metabolic 
pathways included in the KEGG database (Kanehisa et al. 
2012) using blast2GO v. 2.6.6 (Conesa et al. 2005; Gotz 
et al. 2008). For comparative reasons, the same procedure 
was also performed for the gene sets of 5. mediterranea, 
E. multilocularis, S. mansoni, Caenorhabditis elegans, Lottia 
gigantea, and Helobdella robusta. 



Analyses of Gene Models 

Basic statistics for obtained G. salaris gene models, including 
average intron/exon lengths and number of introns, were cal- 
culated using custom scripts. The functionally annotated gene 
models were screened for the presence/absence of selected 
genomic traits that were specifically addressed in a recent 
study on cestode genomes (Tsai et al. 2013) (see supplemen- 
tary file S1, Section D for details, Supplementary Material 
online). 

Gene Orthology Search, Cluster Selection, and 
Phylogenomic Analyses 

The protein complements of 16 metazoan taxa (supplemen- 
tary file S2, table S2, Supplementary Material online) were 
searched for reciprocal best hits using BLAST (Altschul et al. 
1997), and the results were subsequently used for identifying 
orthologous gene clusters by using the MCL algorithm (van 
Dongen 2000) following Fischer et al. (2011). An inflation 
parameter of 2.1 was used as previously described (Smith 
et al. 201 1). For subsequent phylogenetic analyses of meta- 
zoan relationships, a set of putative orthologous gene clusters 
was selected based on criteria used in previous studies (Dunn 
et al. 2008; Hejnol et al. 2009) with minor modifications: 
Clusters of putative orthologous genes were required to 1) 
include at least eight taxa (i.e., >50%), 2) include at least 
one taxon from each of the four platyhelminth groups, 3) 
have a mean of <5 sequences per taxon, and 4) have a 
median of <2 sequences per taxon. Sequences of each ac- 
cepted cluster were aligned using Clustal Omega (Sievers et al. 
2011). Ambiguously aligned regions were removed using 
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Aliscore and Alicut (Kuck 2009; Misof and Misof 2009). 
Maximum-likelihood (ML) phylogenetic trees were inferred in- 
dividually for each gene cluster with RAxML 7.3.2 (Stamatakis 
2006) using the best-fitting amino acid substitution model as 
determined by the RAxML amino acid substitution model se- 
lection Perl script. Individual tree robustness was evaluated 
with 100 bootstrap pseudo replicates. Clusters of putative 
orthologous genes were then divided into classes 1) contain- 
ing no paralogs, 2) containing only monophyletic paralogs 
(i.e., putative in-paralogs), or 3) containing paraphyletic para- 
logs (i.e., putative out-paralogs), using custom scripts. The 
latter clusters were omitted from further analyses. Clusters 
containing in-paralogs were subjected to monophyly masking, 
that is, all but one representative paralog per taxon were re- 
moved from the alignment. For each such cluster, multiple 
alignments including only one paralog at a time were con- 
structed using Clustal Omega (Sievers et al. 2011) and 
trimmed using Aliscore and Alicut (Kuck 2009; Misof and 
Misof 2009). Custom scripts were used to assess all align- 
ments and select the final alignment for each cluster using 
only the particular paralog that yielded the longest alignment 
after trimming, that is, caused the smallest number of ambig- 
uously aligned positions in the alignment. If several paralogs 
ranked equally, one was chosen randomly. Clusters without 
paralogs (Class 1) and the monophyly masked clusters (Class 
2) were concatenated to a supermatrix using FASconCAT 
(Kuck and Meusemann 2010), after removing any clusters 
yielding alignments shorter than 100 amino acid positions. 
ML analyses were performed on the final matrix using 
RAxML 7.8.3 (Stamatakis 2006). The data set was partitioned 
according to the individual genes and the best-fitting amino 
acid substitution model, as determined by the RAxML amino 
acid substitution model selection Perl script, was applied to 
each partition. ML tree robustness was assessed using 100 
bootstrap pseudo replicates. Bayesian inference (Bl) was per- 
formed using the CAT-GTR model (Lartillot and Philippe 2004) 
as implemented in Phylobayes-MPI 1 .3b (Lartillot et al. 2013). 
Bl was run for 2,000 generations in five parallel chains. The 
initial 500 generations were discarded as burn in, well after 
convergence (MaxDiff values below 0.3). A 50% consensus 
tree was computed from the remaining 1 ,500 trees from each 
chain. Bl tree robustness was assessed using posterior proba- 
bilities. In addition, ML and Bl were inferred for a concate- 
nated alignment of a reduced set of genes with the strongest 
phylogenetic signal (average bootstrap support >80). This 
method was recently proposed to reduce potential incongru- 
ence in phylogenomic analyses (Salichos and Rokas 201 3). The 
same study emphasized that bootstrap values can be mislead- 
ing in large phylogenomic data sets and introduced internode 
certainty (IC) as a more sensitive metric for the detection of 
incongruence. However, only genes that are represented in all 
taxa can currently be taken into account for the calculation of 
IC. For the above described data set of 16 metazoan taxa, IC 
would be based on <50% of the genes. To overcome this 



limitation, we compiled a further data set targeting exclusively 
lophotrochozoan taxa. Clusters of orthologous genes were 
required to 1) include all 10 lophotrochozoan taxa, 2) have 
a mean of <5 sequences per taxon, and 3) have a median of 
<2 sequences per taxon. Nonlophotrochozoan taxa were 
removed and individual alignments were subsequently pro- 
cessed (aligned, trimmed, and monophyly masked) as de- 
scribed above. The data set was finally reduced to 
exclusively genes with strongest phylogenetic signal (in this 
case an average bootstrap support threshold of 90 was feasi- 
ble), and phylogenetic analyses were performed as described 
above. IC was inferred using RAxML 8.0.3 (Stamatakis 2006). 
See supplementary file S1, Section C for more details, 
Supplementary Material online. The scripts used for the pre- 
paration of data were made freely available (https://github. 
com/chrishah/phylog/, last accessed April 30, 2014). 

Results 

Genome Sequencing and Gene Prediction 

Five de novo assemblers were tested and the result obtained 
by Celera 7.0 (Myers et al. 2000) was selected as the best 
conservative draft assembly for G. safaris (fig. 1) based on 
basic assembly metrics (e.g., N50) and the inferred robustness 
against putative nontarget reads (see supplementary file S1, 
Section A, and supplementary file S2, table S1 for details, 
Supplementary Material online). Furthermore, unlike other as- 
semblers Celera 7.0 used both Roche 454 FLX Titanium and 
lllumina GAII reads in a simultaneous hybrid assembly. The 
shotgun sequencing strategy combined 0.6 Gb Roche 454 
FLX Titanium (average read length 51 4 bp) and 19.8 Gb 
lllumina GAII (read length 76 bp; paired end insert size 
167 ±40 bp) reads. After removal of putative nontarget con- 
tigs, the draft genome comprised 6,075 contigs (>200bp; 
N50 18.4kb; average coverage ~126x) totaling 67.4 Mb 
(GC content 33.84%). The gene space was estimated to be 
>89.11% complete using CEGMA (Parra et al. 2007). The 
two most conservative CEGMA reference gene sets were 
recovered at a rate of >95% in the draft genome of G. safaris. 
A total of 25.68% of the assembly was identified as repetitive, 
and the genome size was estimated as -120 Mb (see supple- 
mentary file S1, Section A, Supplementary Material online). 
The assembled sequences of the G. safaris draft genome were 
deposited in GenBank (BioProject ID PRJNA244375) and can 
furthermore be obtained from the Gyrodactylus safaris 
genome project's web page (http://invitro.titan.uio.no/gyro 
dactylus/, last accessed April 30, 2014). We identified a total 
of 93,185 SNPs (SNP density 1 .55 SNPs/kb) with a transition/ 
transversion ratio of 0.92. Structural gene annotation pro- 
duced 15,488 gene models, of which 8,637 contained func- 
tional domains. Functional annotation was achieved for 7,102 
gene models (supplementary file S1, Section D and fig. S4, 
Supplementary Material online). The average G. safaris gene 



1 1 08 Genome Biol. Evol. 6(5): 1 105-1 117. doi:10.1093/gbe/evu078 Advance Access publication April 14, 2014 



Genomics of Flatworms and Neodermata 



GBE 




Lgigantea M0LLU5CA 

0.88 



0,3 

Fig. 2. — Phylogenetic relationships among the major groups of parasitic flatworms. Bl topology (CAT-GTR model) based on a supermatrix containing the 
312 gene models (135,085 amino acid positions) with the strongest phylogenetic signal (average bootstrap >90). Full circles represent nodes with posterior 
probability = 1 and boostrap support of 100. Numbers next to internodes represent IC, and are based on a subset of 31 1 genes available for the full 
taxon set. 



model was 2.7 kb long, contained four exons with an average 
length of 289-bp coding for a 906-bp transcript. Overall, the 
average intron length was 659 bp. However, when looking at 
introns according to their position within genes, it turned out 
that the average intron length decreased significantly 
(Pearson's product-moment correlation p=-0.88, 
P=8.19 x 10~ 9 ) with increasing ordinal position (see supple- 
mentary file S1, fig. S1 1 , Supplementary Material online). 

Phylogenomic Analysis 

The orthology search for the metazoan data set identified 
38,568 orthologous gene clusters, out of which 3,121 
passed the selection criteria (see Materials and Methods sec- 
tion). A total of 548 orthologous gene clusters contained no 
paralogs, and 1,303 clusters contained only monophyletic 
paralogs. The supermatrix for phylogenomic analyses of the 
metazoan data set contained 1,719 gene clusters >100 
amino acids (aa), and comprised a total of 519,023 aa posi- 
tions (supplementary files S3 and S2, table S4, Supplementary 
Material online). Within the Platyhelminthes, the monophyly 
of the Neodermata with a basal position of the ectoparasitic 
Monogenea and a sister group relationship of the endopara- 
sitic Trematoda and Cestoda was highly supported in all anal- 
yses (fig. 2). With respect to deep metazoan relationships, the 
phylogenetic tree obtained using Bl (supplementary fig. S6, 
Supplementary Material online) is in agreement with previ- 
ously published results and recovers the monophyly of 



Chordata, Ecdysozoa, Lophotrochozoa, and Platyhelminthes 
with high statistical support. The phylogenomic reconstruc- 
tions using a subset of the metazoan data set, that is, includ- 
ing exclusively the 173 gene models with the strongest 
phylogenetic signal (average bootstrap support of gene trees 
>80; supplementary file S2, table S5 and supplementary file 
S4, Supplementary Material online) consistently recovered the 
same tree topology (see supplementary file S1 , fig. S5£> for ML 
and fig. S6 for Bl, Supplementary Material online). The ML tree 
for the full metazoan data set differed from the Bl tree in the 
position of the nematode C. elegans, which was recovered in 
a sister group relationship to the Platyhelminthes. This is likely 
an artifact caused by long branch attraction (Felsenstein 1 987; 
see supplementary file S1, Section C for more detailed discus- 
sion, Supplementary Material online). The lophotrochozoan 
data set of genes with strongest phylogenetic signal (average 
bootstrap support >90) comprised a total of 312 genes and 
1 35,085 aa positions (supplementary files S5 and S2, table S6, 
Supplementary Material online). Both Bl and ML analyses of 
this data set strongly supported a basal position of the 
Monogenea and a sister relationship of Cestoda and 
Trematoda (fig. 2). IC values are based on 311 individual 
gene trees. 

Genome-Wide Reduction of Bilaterian Genes 

We identified a total of 1,490 orthologous gene clusters, 
which contained representatives of all 15 bilaterian taxa 
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Fig. 3. — Venn diagram illustrating the number of gene clusters devoid 
of orthologous genes of one to three parasitic flatworm groups, while 
present in all other nonneodermatan Bilateria in the data set. 



(supplementary file S2, table S2, Supplementary Material 
online). A total of 315 clusters were found to lack G. salaris 
genes, while containing genes of each of the eight nonneo- 
dermatan bilaterians. Similarly, 277 and 230 gene clusters 
were lacking gene models of Cestoda and Trematoda, respec- 
tively. Integrating these results indicates that 158 bilaterian 
core genes have been lost throughout the Neodermata (fig. 
3). For the other bilaterians included in the data set much 
lower numbers of lineage-specific losses were observed, that 
is, 23 in L. gigantea (Mollusca), 36 in H. robusta (Annelida), 99 
in S. mediterranea (Turbellaria), 1 1 0 in C. elegans (Nematoda), 
46 in Daphnia pulex (Arthropoda), and 8 in Homo sapiens 
(Chordata). 

Specific Genomic Features in Parasitic Flatworms 

The gene complement of G. salaris was analyzed with respect 
to recently proposed genomic adaptations to parasitism in 
flatworms (Tsai et al. 2013). A total of 55 DEAD/DEAH box 
helicase containing gene models were identified for G. salaris. 
However, the ubiquitous stem-cell-specific RNA helicase vasa 
(DDX4) was absent. One gene model was found orthologous 
to the structurally very similar PL10 (DDX3) gene, and clus- 
tered within a clade consisting exclusively of parasitic flatworm 
PL10 (supplementary file S1, fig. S8, Supplementary Material 
online). The G. salaris homeodomain-containing gene comple- 
ment comprised 79 gene models, which could be assigned to 
52 families within nine classes (supplementary file S1, fig. S7 
and supplementary file S2, table S7, Supplementary Material 
online). All classes reported earlier for tapeworms and flukes 
were represented, and 23 of the 24 homeodomain genes 
previously described as lost in flukes and tapeworms (Tsai 
et al. 2013) were absent in G. salaris as well; however, Vsx 
was represented in the draft genome. A further 1 0 losses were 
detected in G. salaris, three of them shared with tapeworms 



{Gbx, Bah, and Rax), four with flukes {Mkx, Zeb, Cux, and 
Hox3), and three appeared specific to the monogenean line- 
age (Meox, En, and Pou2) (fig. 4). The p/W/ subfamily was not 
detected in the G. salaris genome. However, two argonaute 
gene models were found in the G. salaris gene complement, 
which clustered within the AGO family and a further clade 
comprising exclusively flatworm sequences, respectively (sup- 
plementary file S1, fig. S9, Supplementary Material online). 

The intron length distribution of G. salaris gene models had 
two prominent modes at -32 bp (subsequently referred to as 
short mode) and -58 bp (long mode) (supplementary file S1, 
fig. S10 compares the intron length distributions of four flat- 
worm taxa, Supplementary Material online). A total of 14,342 
introns (31 %) were shorter than 1 00 bp, whereas the remain- 
ing 31,312 introns (69%) exceeded 100 bp (ranging up to 
-20,000 bp in length), which resulted in an average intron 
length of 659 bp. The ratio of the occurrence of introns of 
the two modes changed with their ordinal position within the 
genes. Ordinal intron positions 1 and 2 were dominated by 
short mode introns, whereas the long mode introns domi- 
nated at ordinal positions >3 (supplementary file S1, 
fig. S12, Supplementary Material online). Echinococcus multi- 
locularis orthologs of G. salaris genes, which contained introns 
falling into either the short or the long length mode, were 
more likely themselves to contain introns of the corresponding 
short/long mode in E. multilocularis than expected by chance 
(see supplementary file S1, Section D and supplementary file 
S2, table S8, Supplementary Material online). 

In total, eight heat shock protein (Hsp) 70 gene models 
were identified in the G. salaris draft genome. All Hsp70 sub- 
families previously reported for flatworms (Tsai et al. 2013), 
that is, Hsp110, ER Hsp70, mitochondrial Hsp70, and flat- 
worm cytosolic Hsp70 were present, and each subfamily 
was represented by one to two gene copies (supplementary 
file S1, fig. S16, Supplementary Material online). None of the 
Hsp70 genes detected for G. salaris clustered with tapeworm- 
specific Hsp70 genes reported by Tsai et al. (2013). 

The four peroxisome marker proteins PEX3, PEX10, PEX12, 
and PEX19, proposed earlier as unequivocal in silico indicators 
for the presence of peroxisomes (Schluter et al. 2006), were 
not detected in the G. salaris draft genome, nor in the gene 
complement of 5. mansoni or E. multilocularis. However, we 
detected indication of all four marker proteins in the genomes 
of the free-living flatworm 5. mediterranea, as well as the 
mollusc L gigantea, the annelid H. robusta, and the nematode 
C. elegans. 

The majority of genes encoding key enzymes necessary for 
fatty acid synthesis was absent from the G. salaris gene com- 
plement. Only the acetyl-CoA carboxylase, involved in fatty 
acid precursor production, could be detected (supplementary 
file S1, fig. S13, Supplementary Material online). The enzyme 
complement of fatty acid metabolism and -elongation, how- 
ever, was found to be largely complete (supplementary 
file S1, figs. S14 and S15, Supplementary Material online). 
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Furthermore, a range of fatty acid binding and -transport pro- 
teins were identified (supplementary file S2, table S12, 
Supplementary Material online). 

Discussion 

In recent years, several genomes of platyhelminths have 
become available. Species causing severe human diseases, 
such as the blood flukes 5. mansoni (Berriman et al. 2009) 
and S. japonicum {Schistosoma japonicum Genome 
Sequencing and Functional Analysis Consortium 2009), were 
the first to be studied at a genomic level, and recently four 
tapeworm genomes of public health interest were published 
(Tsai et al. 2013; Zheng et al. 2013), forming the basis for 
comparative genomic analyses in order to identify specific evo- 
lutionary adaptations to the obligate parasitic lifestyle in flat- 
worms. Accordingly, Tsai et al. (2013) proposed a number of 
genomic features for parasitic flatworms; however, their con- 
clusions must be considered preliminary because the phyloge- 
netic relationships within Neodermata were not resolved, and 
there was no representative of the Monogenea, the third 
major parasitic group, included in their comparative genomic 
analysis. 

The G. salaris Genome 

The size of the G. salaris genome, calculated as ~1 20 Mb, was 
found in the same order of magnitude as other platyhelminth 
genomes, including the sequenced tapeworms, which range 
between 115 and 151 Mb (Tsai et al. 2013; Zheng et al. 
2013). Using shallow sequencing data, the genome size of 
the closely related sibling species Gyrodactylus thymalli was 
recently estimated at -170 Mb (Hahn et al. 2013). This esti- 
mate, however, can only be interpreted as an upper limit be- 
cause the authors had only little information on the ratio of 



host and other nontarget DNA in the analyzed total DNA ex- 
tract. Although genome assembly for metazoans with rela- 
tively small genome size has improved greatly over the past 
years to the point where it has become a democratized, 
bottom-up enterprise (Kumar et al. 2012), a number of chal- 
lenges remain for small invertebrates. The very small amounts 
of genomic DNA that can be extracted from single individuals 
may be insufficient for the needs of some NGS applications. 
Low DNA yield is also an issue for Gyrodactylus as individual 
parasites contain only about 1,000 cells (Bakke et al. 2007). 
Whole-genome amplification (WGA) approaches may over- 
come this obstacle (Rodrigue et al. 2009), but in turn may 
imply other problems such as amplification of nontarget 
DNA, which can confuse genome assembly. Furthermore, 
bias in the WGA may leave large regions of the genome 
unsequenced (Rodrigue et al. 2009). In order to provide 
enough genomic DNA for NGS sequencing of G. salaris — in 
particular for the Roche 454 FLX Titanium pyrosequencing — 
we chose the strategy of extracting DNA from a pooled 
sample. The sampled population was reared experimentally 
under controlled conditions (Salte et al. 2010), and ge- 
netic heterozygosity was expected to be low. Indeed, the ob- 
served SNP density was less than half of the recently described 
1/288 bp (3.47 SNPs/kb) for S. japonicum (Liu et al. 2006). As 
Gyrodactylus feeds on host mucus and epithelial cells approx- 
imately every 15-30 min (Cable et al. 2002), host cells and 
bacteria of the host skin epiflora will always be present in the 
gut of the parasites. The inevitable coextraction of this nontar- 
get DNA further complicates genome assembly of G. salaris. 
As the host genome is about one order of magnitude larger 
(estimated to 3 x 10 9 bp (Davidson et al. 2010)) than that of 
G. salaris, even minute amounts of host cells in the total DNA 
sample could lead to significant contamination on the NGS 
read level. To overcome this challenge, we applied an in silico 
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cleaning of the NGS data downstream to sequencing. 
We adopted a recently proposed strategy for the separation 
of genomes of symbiotic organisms (Kumar and Blaxter 201 1) 
that is based on differing aggregate properties of parasite 
and nontarget genomes. The bioinformatic cleaning of the 
G. safaris assembly was performed aggressively to minimize 
nontarget contigs in the draft genome data set, and we 
consider the removal of -50% (see supplementary file S1, 
Section A, Supplementary Material online) of all reads appro- 
priate in the light of the potential sample contamination. An 
assembly N50 of around the median gene length of an organ- 
ism is sufficient to expect about 50% of all genes to be unin- 
terruptedly present in a draft (Yandell and Ence 2012) — a 
simple rule of thumb suggests the median gene length in 
the genome of G. salaris (estimated as ~ 120 Mb) to be 
about 2.5 kb (interpolated after fig. 1 in Yandell and Ence 
(2012)). The actual N50 of >18kb thus suggests an uninter- 
rupted representation of the majority of genes. The gene 
space in the draft genome of G. salaris was assessed as 
being roughly 90% complete based on the presence of a 
set of core genes of varying level of evolutionary conservatism 
(Parra et al. 2007, 2009). However, it is very likely that the 
gene space is even more complete because CEGMA tends to 
underestimate the completeness of divergent genomes. It has 
thus been suggested earlier that only the most conservative 
groups of genes should be used as indicators for completeness 
in such cases (Parra et al. 2009). In line with this reasoning, the 
two most conservative CEGMA reference gene sets were re- 
covered at a rate of >95% in the draft genome of G. salaris. 
The 1 5,488 predicted gene models for G. salaris were well 
within the range reported for other parasitic flatworms, for 
example, 1 0,345 for £ multilocularis (Tsai et al. 201 3), 1 1 ,325 
for £ granulosus (Zheng et al. 2013), 10,852 for 5. mansoni 
(Berriman et al. 2009), and 1 6,258 for C. sinensis (Wang et al. 
2011). Although the published information on general gene 
statistics for previously sequenced flatworms is slightly ambig- 
uous and rather fragmentary, the results for G. salaris gene 
models indicated a reduction in average gene length and num- 
ber of exons per gene when compared with other parasitic 
flatworms (supplementary file S2, table S3, Supplementary 
Material online). The average intron size calculated for 
G. salaris was similar to that previously reported for tape- 
worms, whereas average protein- and exon length appeared 
similar across flatworms in general. The prediction of gene 
models for G. salaris was based on the information available 
in public databases (see supplementary file S2, table S2, 
Supplementary Material online) combined with ab initio 
gene prediction. Although purely ab initio gene predictors 
are prone to overprediction, MAKER2 appeared relatively 
robust in respect to false-positive predictions (Holt and 
Yandell 2011). Whether the set of genes predicted for 
G. salaris is indeed inflated remains to be clarified by means 
of transcriptome sequencing and expression studies. 
However, the largely concordant gene statistics when 



compared with other parasitic flatworms (see above), and 
the identification of functional domains and functional anno- 
tation in 55.8% and 45.9% of the predicted gene models, 
respectively, imply the inference of a high-quality set of gene 
models for G. salaris. In comparison, in the recently released 
genome of £ granulosus 40.3% of the gene models could be 
functionally annotated (Zheng et al. 2013). 

Phylogenomics of Platyhelminthes 

The phylum Platyhelminthes comprises the free-living 
Turbellaria and the obligate parasitic Monogenea, Cestoda, 
and Trematoda, the latter three groups forming the 
Neodermata. Although there was little dispute about the 
monophyly of Neodermata in the phylum, the phylogenetic 
relationships within the Neodermata have been discussed for 
many years. The phylogenomic analyses unambiguously sup- 
ported the monophyly of the Platyhelminthes and the obligate 
parasitic Neodermata with high statistical support. This study 
provides the first genome wide approach to assess the inter- 
relationships of the obligate parasitic flatworms, and within 
the monophyletic Neodermata all analyses consistently recov- 
ered an early divergence of the ectoparasitic Monogenea and 
a sister group relationship of the endoparasitic Trematoda and 
Cestoda (more detailed discussion of the IC values can be 
found in supplementary file S1, Section C, Supplementary 
Material online). The data therefore soundly reject a sister 
group relationship between Monogenea and Cestoda, previ- 
ously united as the Cercomeromorphae (Janicki 1920), and 
support recent results obtained by analyzing combined ribo- 
somal 18S and 28S data sets (Lockyer et al. 2003), mitochon- 
drial genomes (Park et al. 2007; Perkins et al. 2010), and 
microRNA complements (Fromm et al. 201 3). We cannot com- 
ment on a possible paraphyly of the Monogenea in respect to 
the two main lineages within, the Monopisthocotylea and 
Polyopisthocotylea (Mollaret et al. 1997; Justine 1998). 
Nevertheless, the draft genome of the monopisthocotylean 
G. salaris provides a paramount foundation for future studies 
targeting this question. The basal position of the Monogenea 
within the Neodermata implies that the endoparasitism of 
Cestoda and Trematoda is a monophyletic but derived lifestyle 
within Platyhelminthes and that the ectoparasitism in 
Monogenea represents the plesiomorphic state of parasitism 
in Neodermata. As pointed out by Park et al. (2007), a sister 
group relationship between Cestoda and Trematoda also sug- 
gests that the transition from direct life cycles on a single host 
(as in Monogenea) to complex life cycles with multiple hosts (as 
in Cestoda and Trematoda) was a single evolutionary event. 

Genomic Features of Obligate Parasitism in Flatworms 

With the phylogenetic relationships within Neodermata re- 
solved the draft genome of G. salaris allows for assessing 
whether recently proposed adaptive genomic traits for 
flukes and tapeworms (Tsai et al. 2013) are related to the 
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evolution of endoparasitism or represent in fact synapomor- 
phies of the Neodermata. The list of the adaptive genomic 
features included among others 1) the absence of vasa, 2) a 
substantially reduced homeobox gene complement, 3) the 
loss of the piwi subfamily and the emergence of a fluke- 
and tapeworm-specific subfamily of argonaute proteins, 4) 
the innovation of a bimodal intron length distribution in tape- 
worms, 5) an expansion of the Hsp70 family in tapeworms, 6) 
the absence of essential peroxisome enzymes, and possibly 
the absence of peroxisomes in general, and 7) the lack of 
key enzymes for fatty acids synthesis. 

The ubiquitous stem-cell specific RNA helicase vasa (DDX4) 
was absent in tapeworms and flukes. Instead two copies of 
the structurally very similar PL10 (DDX3) gene were found in 
both lineages, and it was speculated that a duplication of PL 1 0 
(DDX3) in free-living flatworms facilitated a functional take- 
over of vasa by PL10, followed by a subsequent loss of the 
then functionally redundant vasa gene in the common ances- 
tor of tapeworms and flukes (Tsai et al. 2013). Both vasa as 
well as the parasite-specific PL10 have been reported for the 
monogenean Neobenedenia girellae (Ohashi et al. 2007), an 
ectoparasite of marine fish, which, like G. salaris, belongs to 
the Monopisthocotylea. However, in G. salaris we found only 
one copy of the parasitic flatworm-specific PL10 protein. Vasa 
was absent, and the loss of the gene is thus not a specific 
feature of tapeworms and flukes. Whether PL10 has function- 
ally taken over for vasa in G. salaris remains to be tested, but it 
is noteworthy that knock-down experiments in N. girellae in- 
dicated a more important role of PL10 in germ cell formation 
and fertilization than vasa (Ohashi et al. 2007). The reason 
why two monopisthocotylean monogeneans differ in this re- 
spect is unclear but may reflect the suggested paraphyly of the 
group (Perkins et al. 2010). 

The homeodomain complement of invertebrates is very 
variable. Recently substantial expansions were reported in 
the lophotrochozoans H. robusta (Annelida, Hirudinea), 
Capitella telata (Annelida, Polychaeta), and L gigantea 
(Mollusca, Gastropoda) with 181, 121, and 111 homeodo- 
main-containing proteins, respectively (Simakov et al. 2013). 
In contrast, a shared loss of 24 homeobox gene families in 
flukes and tapeworms, and an additional loss of 10 further 
families unique to tapeworms, when comparing the respec- 
tive homeodomain complements to a set of 96 homeobox 
gene families expected for the most recent common ancestor 
of the Bilateria, were reported (Tsai et al. 2013). A similar 
reduction in the homeodomain-containing gene complement 
was confirmed for G. salaris. The data imply that the majority 
of homeodomain gene losses must have occurred early in the 
evolution of Neodermata. There was, however, also an indi- 
cation of a partly convergent history of homeobox gene family 
losses in the three neodermatan lineages (fig. 4). Although 
direct experimental evidence is still lacking, the substantial 
loss of homeodomain genes may be an adaptation to an 
obligate parasitic lifestyle. 



The piwi gene subfamily was also reported lost in cestodes 
and trematodes. Along with the gain of a specific argonaute 
subfamily, these have been proposed as tapeworm- and 
fluke-specific adaptations to parasitism (referred to as group 
4 argonaute clade by Tsai et al. 2013). However, the piwi gene 
subfamily was absent also from the draft genome of G. salaris 
and we conclude therefore that the loss of this particular gene 
subfamily is a further synapomorphy of Neodermata. In addi- 
tion, genes belonging to the fluke- and tapeworm-specific 
group 4 argonaute clade (Tsai et al. 2013) were also present 
in the genome of G. salaris. As reported earlier (Zheng 

2013) , we confirmed a 5. mediterranea argonaute protein 
clustering within this clade (supplementary file S1, fig. S9, 
Supplementary Material online), and thus conclude that this 
represents a flatworm-specific expansion, rather than a phe- 
nomenon restricted to endoparasitic neodermatans. The loss 
of piwi in Neodermata raises of course many questions with 
respect to the gene's canonical functions. Piwi and argonaute 
proteins are essential for maintaining genome integrity in stem 
cells in a process referred to as p/W/'-interacting RNA (piRNA) 
pathway; in short they repress the mobilization of transpos- 
able elements (TEs). The question how flatworms keep up 
genome integrity without piwi and vasa was addressed re- 
cently (Skinner et al. 2014). The authors suggested that 
parasitic platyhelminths may have evolved a germline, stem 
cell-specific endogenous, noncanonical siRNA pathway for TE 
silencing that took over the piRNA pathway (Skinner et al. 

2014) . 

Intron lengths in eukaryotes usually vary within a broad 
range. However, the length distribution is frequently charac- 
terized by at least one distinct mode commonly below 1 00 bp: 
These putative optima for minimal intron length vary between 
species. Despite the occurrence of such microintron modes, 
the average intron length is usually substantially higher, for 
example, 663 in E. multilocularis (Tsai et al. 2013), 1,411 in 
Drosophila melanogaster, and 372 in C. elegans (Yu et al. 
2002). Yu et al. (2002) argued that such minimal introns 
may enhance the rate at which messenger RNA is exported 
from the cell nucleus. A bimodal intron length distribution was 
earlier suggested as an evolutionary innovation, that is, a 
newly gained genomic trait, in tapeworms (Tsai et al. 2013), 
and the authors speculated about modifications of the splicing 
machinery and/or the nucleus export machinery as being 
causal adaptations for this phenomenon. However, a bimodal 
distribution was also found in G. salaris with modes at -32 
and -58 bp, respectively (supplementary file S1, Section D and 
fig. S10, Supplementary Material online), indicating that this 
feature is more common than previously suggested. 

Although the mode positions vary between the species, we 
found evidence for congruence between G. salaris and 
E. multilocularis gene models in respect to the gene composi- 
tion in their modes. Gyrodactylus salaris gene models contain- 
ing introns falling either into the short (2,232) or the long 
intron length modes (2,199) were more likely to be 
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orthologous to E. multilocularis gene models containing the 
corresponding short/long mode introns (1,793 and 2,156, 
respectively) than expected if gene models were randomly 
distributed in the two species (see supplementary file S1, 
Section D for more details, Supplementary Material online). 
In G. salaries, we furthermore found remarkable differences in 
the number of short/long mode introns depending on their 
ordinal position within the gene models. Although the short 
mode introns are most prevalent at intron positions 1 and 2 of 
the gene models, the long mode introns are more evenly dis- 
tributed with respect to ordinal position, but become domi- 
nant in introns of ordinal position >3 (see supplementary file 
S1, fig. S12, Supplementary Material online). However, aver- 
age intron length in G. salaris gene models was found to sig- 
nificantly decrease with ordinal position (see supplementary 
file S1, fig. S1 1, Supplementary Material online). Such a neg- 
ative correlation between intron length and intron ordinal po- 
sition was reported earlier for a range of organisms (Marais 
et al. 2005; Gazave et al. 2007; Zhang and Edwards 2012). 
For primates and Drosophila, it has furthermore been reported 
that levels of evolutionary constraint vary across classes of in- 
trons of different length. Although longer introns appeared 
more divergent in primates (Gazave et al. 2007), there was a 
negative correlation between intron length and the level of 
divergence in Drosophila (Haddrill et al. 2005). Whether or not 
intron length in G. salaris is correlated with evolutionary con- 
straints needs to be addressed in future studies. Such more 
comprehensive studies may also investigate whether the bi- 
modal intron length distribution and the biased occurrence of 
short and long mode introns with respect to their ordinal po- 
sition are adaptive. In this context, it also needs to be assessed 
whether the bimodal intron length distribution is a synapo- 
morphy of Neodermata, a hypothesis that currently implies a 
secondary loss of this genomic trait in Trematoda as observed 
for 5. mansoni (Tsai et al. 2013) (see also supplementary file 
S1, Section D and fig. S10, Supplementary Material online). 

Peroxisomes are largely ubiquitous organelles in eukaryotes 
involved in a variety of metabolic processes and essential in 
fatty acid — and antioxidant metabolism (Lazarow and Fujiki 
1985; Schluter et al. 2007). The absence of many genes as- 
sociated with peroxisomes including four unequivocal perox- 
isome marker proteins (Schluter et al. 2006) led to the 
conclusion that tapeworms and flukes may lack peroxisomes 
(Tsai et al. 201 3), a feature also reported for the Apicomplexan 
parasites Plasmodium falciparum, P. yoelii, Cryptosporidium 
parvum, Toxoplasma gondii, and Theileria parva (Schluter 
et al. 2006, 2007). Our results indicate the absence of perox- 
isomes also in G. salaris and confirm the findings for endopar- 
asitic flatworms, whereas the detection of all four marker 
proteins in 5. mediterranea, H. robusta, L gigantea, and 
C. elegans, respectively, denotes the presence of functional 
peroxisomes in the latter species. The inability to synthesize 
fatty acids de novo has been previously reported for tape- 
worms (Tsai et al. 2013) and flukes (Berriman et al. 2009; 



Wang et al. 201 1) as several key enzymes were lacking, and 
only a few proteins involved in fatty acid precursor production 
were detected. It was therefore proposed that endoparasitic 
flatworms scavenge essential fatty acids from their hosts, and 
the high expression of fatty acid binding proteins (Tsai et al. 
2013) seemed in line with the hypothesis. However, a similar 
strategy might also be adopted by the ectoparasitic G. salaris. 
The genomic data indicate that the loss of peroxisomes and 
the substantial reduction in the fatty acid biosynthesis path- 
way are characteristic for an adaptive strategy to the obligate 
parasitic lifestyle of Neodermata. Further studies will be 
needed to test this hypothesis and to unravel the underlying 
biochemical details. 

The reductions in specific genes/gene families described 
above are in line with the more general observation that a 
large number of expected bilaterian gene models are absent in 
Neodermata (fig. 3). Although such comparisons rely on many 
assumptions and may in addition suffer from a somewhat 
uncertain definition of the set of expected gene models, the 
losses of genes in G. salaris, as well as in tapeworms and 
flukes, appeared much higher than in other bilaterian line- 
ages. A similar loss of expected loci has recently been reported 
for conserved microRNAs (Fromm et al. 201 3). Whether or not 
the loss of genes correlates with the loss of further specific 
biochemical pathways, such as the fatty acid biosynthesis, or 
represents a genome-wide trend needs to be addressed in 
more detail in future studies. However, this study identified 
a large number of genomic traits shared in obligate parasitic 
flatworms in spite of the obvious ecological differences 
of ecto- and endoparasitic lifestyles. These traits are thus 
likely to have arisen as an adaptation to parasitism, already 
in the common ancestor of neodermatan flatworms. The 
high-quality draft genome of G. salaris facilitates now to in- 
clude also a representative of the Monogenea in future com- 
parative genomics studies addressing important biological 
questions such as the genetic basis for ecto- and endoparasitic 
lifestyles as well as the direct (single-host) and the complex 
(multiple-hosts) lifecycles represented in the Monogenea and 
the Trematoda+Cestoda sistergroup, respectively. 

Impact of the G. salaris Genome 

Notably, G. salaris is a notorious parasite of Atlantic salmon 
and other European salmonids and is frequently referred to as 
"the Salmon killer." The parasite causes significant ecological 
and economic damage in infected watercourses. In recent 
years, substantial effort has been directed toward an under- 
standing of strain-specific pathogenicity and transmission be- 
havior in G. salaris (reviewed in Bakke et al. 2007). However, 
most studies have relied on intergenic ribosomal spacers and/ 
or mitochondrial genes as molecular markers (Hansen et al. 
2003, 2007; Meinila et al. 2004), which have proven insuffi- 
cient for resolving these issues, and the lack of diagnostic 
markers thus remains a major limitation in G. salaris strain 
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identification and characterization. The provided draft 
genome of G. salaris can now facilitate the development of 
diagnostic markers, and provides a platform for future inves- 
tigations of the genomic basis of the varying pathogenicity, 
host specificity, and -preference of various strains of G. salaris. 
In general, the first monogenean reference genome is 
expected to enhance future studies on the genomics of 
host-parasite interactions and speciation in the highly diverse 
monogenean flatworms, and will provide a platform for the 
development of drug-based control strategies specifically tar- 
geting essential biochemical functions in monogenean para- 
sites. The gene models presented in this study may also allow 
the identification of "speciation genes," that is, genes that 
retain reproductive isolation (Wu and Ting 2004) after rapid 
evolution following the initial utilization of a new host species. 
Host switching is assumed to play a major role in the diversi- 
fication of monogenean species (Huyse and Volckaert 2005) 
and other parasitic organisms in general (Sorenson et al. 
2003). 

Supplementary Material 

Supplementary files S1-S5 are available at Genome Biology 
and Evolution online (http://www.gbe.oxfordjournals.org/). 
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